Mielke Distribution (mielke, Mielke Beta-Kappa / Dagum)#

The Mielke Beta-Kappa distribution (often called the Dagum distribution) is a flexible family on positive real values with polynomial (heavy) tails.

It is a good default when:

  • your data are strictly positive (precipitation amounts, income/wealth, claim sizes, waiting times)

  • you need a model that can be very right-skewed with power-law tails

  • you want simple sampling (closed-form inverse CDF)


Learning goals#

By the end you should be able to:

  • write the PDF/CDF/quantile function and interpret the parameters

  • compute moments (and know when they do not exist)

  • derive expectation/variance and the likelihood

  • sample from mielke using inverse-transform sampling (NumPy-only)

  • fit and use the distribution via scipy.stats.mielke

Notation#

  • Shape parameters: \(k > 0\), \(s > 0\)

  • Random variable: \(X \sim \mathrm{Mielke}(k, s)\) (standard form)

  • Standard support: \(x > 0\)


Table of contents#

  1. Title & Classification

  2. Intuition & Motivation

  3. Formal Definition

  4. Moments & Properties

  5. Parameter Interpretation

  6. Derivations

  7. Sampling & Simulation

  8. Visualization

  9. SciPy Integration

  10. Statistical Use Cases

  11. Pitfalls

  12. Summary

import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio

import scipy
from scipy import stats
from scipy.special import gammaln, psi, logsumexp

pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)

rng = np.random.default_rng(42)

print("numpy:", np.__version__)
print("scipy:", scipy.__version__)
print("plotly:", plotly.__version__)
numpy: 1.26.2
scipy: 1.15.0
plotly: 6.5.2

1) Title & Classification#

  • Name: mielke (Mielke Beta-Kappa; also known as the Dagum distribution)

  • Type: continuous

  • Standard support: \(x > 0\) (SciPy defines the PDF for \(x \ge 0\); if \(k<1\) the density diverges as \(x\to 0^+\))

  • Parameter space (standard form): \(k > 0\), \(s > 0\)

  • Location/scale form (SciPy): \(X = \mathrm{loc} + \mathrm{scale}\cdot Y\) with \(Y \sim \mathrm{Mielke}(k, s)\)

    • Support becomes \(x > \mathrm{loc}\)

    • \(\mathrm{scale} > 0\)

2) Intuition & Motivation#

What it models#

mielke is a positive, right-skewed, heavy-tailed model. It is useful when:

  • small values occur frequently (controlled by \(k\))

  • rare, extremely large values occur (tail heaviness controlled by \(s\))

  • a lognormal/gamma tail is too light

The survival function decays like a power law:

\[ \Pr(X > x) = 1 - F(x) \sim \frac{k}{s}\,x^{-s}\quad \text{as } x\to\infty. \]

So \(s\) behaves like a tail index.
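
A quick numerical check of this asymptotic (a small sketch; the scaled survival probability \(x^s \Pr(X>x)\) should approach \(k/s\) as \(x\) grows):

# Tail asymptotic check: x^s * P(X > x) -> k/s as x grows
k_t, s_t = 2.5, 3.0
for x0 in [10.0, 100.0, 1000.0]:
    print(x0, float(x0**s_t * stats.mielke.sf(x0, k_t, s_t)), "->", k_t / s_t)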

Typical real-world use cases#

  • Precipitation / hydrology (Mielke introduced this family for rainfall amounts)

  • Income/wealth modeling (Dagum distribution in economics)

  • Insurance claim sizes and other positive heavy-tailed costs

  • Reliability / lifetime modeling when failures can occur very late

Relations to other distributions#

  • Burr Type III / Dagum: mielke(k, s) is exactly SciPy’s burr(c=s, d=k/s).

  • Log-logistic (fisk): the constraint \(k=s\) gives \(f(x)=\frac{s\,x^{s-1}}{(1+x^s)^2}\), which is the log-logistic (fisk) density.

  • Beta-prime connection: if \(Y = X^s\), then \(f_Y(y)=\frac{k}{s}\,\frac{y^{k/s-1}}{(1+y)^{1+k/s}}\) for \(y>0\), so \(Y\) is Beta-prime with parameters \((k/s,\,1)\).

  • Beta connection: if \(U = \frac{X^s}{1+X^s}\), then \(U\in(0,1)\) and \(U \sim \mathrm{Beta}(k/s, 1)\). This gives an immediate sampler.
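
These relations are easy to verify numerically. The following quick check (a minimal sketch) compares mielke against fisk at \(k=s\), and tests the Beta connection against the \(\mathrm{Beta}(k/s,1)\) CDF \(u^{k/s}\):

# Relation checks: mielke(k=s) vs fisk, and U = X^s/(1+X^s) vs Beta(k/s, 1)
x_chk = np.array([0.3, 1.0, 2.5])
print("max |mielke(2,2) - fisk(2)|:",
      np.max(np.abs(stats.mielke.pdf(x_chk, 2.0, 2.0) - stats.fisk.pdf(x_chk, 2.0))))

k_chk, s_chk = 2.5, 1.7
xs_chk = stats.mielke.rvs(k_chk, s_chk, size=100_000, random_state=np.random.default_rng(0))
u_chk = xs_chk**s_chk / (1.0 + xs_chk**s_chk)
for u0 in [0.2, 0.5, 0.8]:
    print(u0, float((u_chk <= u0).mean()), "vs Beta CDF:", u0 ** (k_chk / s_chk))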

3) Formal Definition#

CDF#

In the standard (unshifted, unit-scale) parameterization, for \(x>0\):

\[ F(x; k, s) = \frac{x^k}{(1+x^s)^{k/s}} = \left(\frac{x^s}{1+x^s}\right)^{k/s} = \left(1 + x^{-s}\right)^{-k/s}, \qquad k>0,\ s>0. \]

PDF#

Differentiating the CDF gives the density:

\[ f(x; k, s) = \frac{k\,x^{k-1}}{(1+x^s)^{1+k/s}},\qquad x>0. \]

Quantile function (inverse CDF)#

Let \(p\in(0,1)\). Solving \(p = (1 + x^{-s})^{-k/s}\) gives:

\[ Q(p) = \left(\frac{p^{s/k}}{1 - p^{s/k}}\right)^{1/s}. \]

Location/scale#

SciPy uses the standard location/scale convention:

\[ X = \mathrm{loc} + \mathrm{scale}\cdot Y,\qquad Y\sim\mathrm{Mielke}(k,s),\ \mathrm{scale}>0. \]

Then \(X\) has support \(x>\mathrm{loc}\) and

\[ f_X(x)=\frac{1}{\mathrm{scale}}\,f_Y\!\left(\frac{x-\mathrm{loc}}{\mathrm{scale}}\right). \]
def mielke_logpdf(x, k, s):
    '''Log-PDF of the standard Mielke (Beta-Kappa / Dagum) distribution (loc=0, scale=1).'''
    x = np.asarray(x, dtype=float)
    k = float(k)
    s = float(s)

    out = np.full_like(x, -np.inf, dtype=float)
    if (k <= 0) or (s <= 0):
        return out

    mask = x > 0
    xm = x[mask]
    logx = np.log(xm)

    # log(1 + x^s) computed stably as log(1 + exp(s log x))
    log1p_xs = np.logaddexp(0.0, s * logx)

    out[mask] = np.log(k) + (k - 1.0) * logx - (1.0 + k / s) * log1p_xs
    return out


def mielke_pdf(x, k, s):
    return np.exp(mielke_logpdf(x, k, s))


def mielke_logcdf(x, k, s):
    '''Log-CDF of the standard Mielke distribution.'''
    x = np.asarray(x, dtype=float)
    k = float(k)
    s = float(s)

    out = np.full_like(x, -np.inf, dtype=float)
    if (k <= 0) or (s <= 0):
        return out

    mask = x > 0
    xm = x[mask]
    logx = np.log(xm)

    # Use F(x) = (1 + x^{-s})^{-k/s} to avoid cancellation when x is large.
    log1p_xnegs = np.logaddexp(0.0, -s * logx)  # log(1 + x^{-s})
    out[mask] = -(k / s) * log1p_xnegs
    return out


def mielke_cdf(x, k, s):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x, dtype=float)
    mask = x > 0
    out[mask] = np.exp(mielke_logcdf(x[mask], k, s))
    return out


def mielke_ppf(p, k, s):
    '''Quantile function Q(p) for p in [0,1].'''
    p = np.asarray(p, dtype=float)
    k = float(k)
    s = float(s)

    x = np.full_like(p, np.nan, dtype=float)
    if (k <= 0) or (s <= 0):
        return x

    x[p == 0] = 0.0
    x[p == 1] = np.inf

    mask = (p > 0) & (p < 1)
    # Let a = (s/k) log p, so q = p^{s/k} = exp(a) in (0,1).
    a = (s / k) * np.log(p[mask])

    # q/(1-q) = exp(a) / (1-exp(a)) = exp(a) / (-expm1(a)) (stable when a≈0).
    log_ratio = a - np.log(-np.expm1(a))
    x[mask] = np.exp(log_ratio / s)
    return x


def mielke_rvs_numpy(k, s, size, rng=None):
    '''NumPy-only sampler via inverse-transform sampling.'''
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(size)
    return mielke_ppf(u, k, s)


def mielke_raw_moment(n, k, s):
    '''Raw moment E[X^n] for -k < n < s; returns +inf when the moment diverges.'''
    k = float(k)
    s = float(s)
    n = float(n)

    if (k <= 0) or (s <= 0):
        return np.nan

    # Integrability:
    # - near 0: f(x) ~ k x^{k-1} so E[X^n] finite iff n > -k
    # - in the tail: f(x) ~ k x^{-s-1} so E[X^n] finite iff n < s
    if not (-k < n < s):
        return np.inf

    return np.exp(gammaln((k + n) / s) + gammaln(1.0 - n / s) - gammaln(k / s))


def mielke_entropy(k, s):
    '''Differential entropy of the standard Mielke distribution.'''
    k = float(k)
    s = float(s)
    if (k <= 0) or (s <= 0):
        return np.nan

    D = psi(1.0 + k / s) - psi(1.0)
    return -np.log(k) + (k - 1.0) / k + (1.0 + 1.0 / s) * D


def mielke_summary_stats(k, s):
    '''Mean/variance/skewness/excess kurtosis (when finite), else nan/inf.'''
    k = float(k)
    s = float(s)

    mean = mielke_raw_moment(1.0, k, s) if s > 1 else np.inf
    if s <= 2:
        return mean, np.inf, np.nan, np.nan

    m2 = mielke_raw_moment(2.0, k, s)
    var = m2 - mean**2

    skew = np.nan
    exkurt = np.nan

    if s > 3:
        m3 = mielke_raw_moment(3.0, k, s)
        mu3 = m3 - 3.0 * m2 * mean + 2.0 * mean**3
        skew = mu3 / (var ** 1.5)

    if s > 4:
        m3 = mielke_raw_moment(3.0, k, s)  # defined since s>4
        m4 = mielke_raw_moment(4.0, k, s)
        mu4 = m4 - 4.0 * m3 * mean + 6.0 * m2 * mean**2 - 3.0 * mean**4
        exkurt = mu4 / (var**2) - 3.0

    return mean, var, skew, exkurt

4) Moments & Properties#

Existence of moments (key takeaway)#

The right tail is polynomial:

\[ \Pr(X > x) \sim \frac{k}{s}\,x^{-s}. \]

So positive moments satisfy:

  • \(\mathbb{E}[X^n] < \infty\) iff \(n < s\).

Near 0, \(f(x)\approx k x^{k-1}\), so negative moments exist iff \(n > -k\).

Overall:

\[ \mathbb{E}[X^n] < \infty\quad \Longleftrightarrow\quad -k < n < s. \]
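
To see moment non-existence in action, compare running sample means (a sketch for intuition: with \(s=0.9\le 1\) the mean is infinite and the running average never stabilizes, while with \(s=3\) it converges):

# Running sample means: no mean when s <= 1 vs finite mean when s = 3
n_run = 200_000
for k_demo, s_demo in [(2.5, 0.9), (2.5, 3.0)]:
    xs_run = mielke_rvs_numpy(k_demo, s_demo, size=n_run, rng=np.random.default_rng(1))
    running = np.cumsum(xs_run) / np.arange(1, n_run + 1)
    print(f"k={k_demo}, s={s_demo}: running mean at n=1e3/1e4/1e5:", running[[999, 9_999, 99_999]])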

Raw moments#

For \(-k < n < s\):

\[ \mathbb{E}[X^n] = \frac{\Gamma\left(\tfrac{k+n}{s}\right)\,\Gamma\left(1-\tfrac{n}{s}\right)}{\Gamma\left(\tfrac{k}{s}\right)}. \]

Mean and variance#

  • Mean (exists for \(s>1\)): \(\mathbb{E}[X] = \frac{\Gamma\left(\tfrac{k+1}{s}\right)\,\Gamma\left(1-\tfrac{1}{s}\right)}{\Gamma\left(\tfrac{k}{s}\right)}\)

  • Second moment (exists for \(s>2\)): \(\mathbb{E}[X^2] = \frac{\Gamma\left(\tfrac{k+2}{s}\right)\,\Gamma\left(1-\tfrac{2}{s}\right)}{\Gamma\left(\tfrac{k}{s}\right)}\)

  • Variance (exists for \(s>2\)): \(\mathrm{Var}(X)=\mathbb{E}[X^2] - \mathbb{E}[X]^2\)

Skewness and kurtosis#

  • Skewness exists for \(s>3\)

  • Excess kurtosis exists for \(s>4\)

You can compute them from raw moments \(m_n = \mathbb{E}[X^n]\) via standard formulas.
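
Concretely, with \(\mu = m_1\) and \(\sigma^2 = m_2 - m_1^2\), the standard formulas are

\[ \mathrm{skew}(X) = \frac{m_3 - 3\mu m_2 + 2\mu^3}{\sigma^3}, \qquad \mathrm{exkurt}(X) = \frac{m_4 - 4\mu m_3 + 6\mu^2 m_2 - 3\mu^4}{\sigma^4} - 3, \]

which is exactly what mielke_summary_stats computes above.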

MGF / characteristic function#

  • The MGF \(M(t)=\mathbb{E}[e^{tX}]\) diverges for any \(t>0\) because the tail is polynomial.

  • The characteristic function \(\varphi(t)=\mathbb{E}[e^{itX}]\) exists for all real \(t\), but does not simplify to elementary functions in general.

Entropy#

The differential entropy has a closed form in terms of the digamma function \(\psi\):

\[ H(X) = -\log k + \frac{k-1}{k} + \left(1+\frac{1}{s}\right)\bigl(\psi(1+k/s) - \psi(1)\bigr). \]
k0, s0 = 2.5, 1.7
x_test = np.array([0.2, 0.5, 1.0, 2.0, 5.0])

pdf_np = mielke_pdf(x_test, k0, s0)
pdf_sp = stats.mielke.pdf(x_test, k0, s0)
print("max |pdf_numpy - pdf_scipy|:", np.max(np.abs(pdf_np - pdf_sp)))

# Relationship to Burr III / Dagum in SciPy: burr(c=s, d=k/s)
pdf_burr = stats.burr.pdf(x_test, s0, k0 / s0)
print("max |mielke - burr(c=s, d=k/s)|:", np.max(np.abs(pdf_sp - pdf_burr)))

mean, var, skew, exkurt = mielke_summary_stats(k0, s0)
mean_sp, var_sp, skew_sp, exkurt_sp = stats.mielke.stats(k0, s0, moments="mvsk")
print("mean:", mean, "(scipy:", float(mean_sp), ")")
print("var:", var, "(scipy:", float(var_sp), ")")
print("skew:", skew, "(scipy:", float(skew_sp), ")")
print("excess kurtosis:", exkurt, "(scipy:", float(exkurt_sp), ")")

h = mielke_entropy(k0, s0)
print("entropy:", h, "(scipy:", float(stats.mielke.entropy(k0, s0)), ")")
max |pdf_numpy - pdf_scipy|: 5.551115123125783e-17
max |mielke - burr(c=s, d=k/s)|: 1.6653345369377348e-16
mean: 2.495423340949614 (scipy: 2.4954233409496127 )
var: inf (scipy: inf )
skew: nan (scipy: nan )
excess kurtosis: nan (scipy: nan )
entropy: 1.6941719835054267 (scipy: 1.6941719835053937 )

5) Parameter Interpretation#

The parameters \(k\) and \(s\) both affect shape, but in different ways.

\(s\) (tail index)#

From the PDF, as \(x\to\infty\):

\[ f(x) = \frac{k x^{k-1}}{(1+x^s)^{1+k/s}} \sim k\,x^{-s-1}. \]

So:

  • larger \(s\) means a lighter right tail

  • the \(n\)-th moment exists iff \(n < s\) (mean requires \(s>1\), variance requires \(s>2\))

\(k\) (behavior near 0)#

As \(x\to 0^+\), \((1+x^s)^{-(1+k/s)}\to 1\), so

\[ f(x) \sim k\,x^{k-1}. \]
  • If \(k<1\), the density diverges at 0.

  • If \(k=1\), the density is finite and nonzero at 0.

  • If \(k>1\), the density goes to 0 at 0.

A convenient derived quantity is \(k/s\) (it appears in the CDF and quantiles). For example,

\[ \mathrm{median}(X) = Q(0.5) = \left(\frac{0.5^{s/k}}{1-0.5^{s/k}}\right)^{1/s}. \]
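A quick consistency check of the median formula (a minimal sketch):

# Median: closed form vs our ppf vs SciPy
k_med, s_med = 2.5, 1.7
med_formula = (0.5 ** (s_med / k_med) / (1.0 - 0.5 ** (s_med / k_med))) ** (1.0 / s_med)
print("formula:", med_formula)
print("mielke_ppf:", float(mielke_ppf(np.array([0.5]), k_med, s_med)[0]))
print("scipy:", float(stats.mielke.median(k_med, s_med)))
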
x = np.logspace(-3, 3, 900)

# Effect of changing k (near-zero behavior)
s_fixed = 2.5
k_list = [0.5, 1.0, 3.0]

fig_pdf_k = go.Figure()
for k in k_list:
    y = np.maximum(mielke_pdf(x, k, s_fixed), 1e-300)
    fig_pdf_k.add_trace(go.Scatter(x=x, y=y, mode="lines", name=f"k={k}, s={s_fixed}"))

fig_pdf_k.update_layout(title="PDF shape when varying k (s fixed)")
fig_pdf_k.update_xaxes(type="log", title="x")
fig_pdf_k.update_yaxes(type="log", title="pdf(x)")
fig_pdf_k.show()

# Effect of changing s (tail heaviness)
k_fixed = 2.5
s_list = [0.9, 2.0, 5.0]

fig_pdf_s = go.Figure()
for s in s_list:
    y = np.maximum(mielke_pdf(x, k_fixed, s), 1e-300)
    fig_pdf_s.add_trace(go.Scatter(x=x, y=y, mode="lines", name=f"k={k_fixed}, s={s}"))

fig_pdf_s.update_layout(title="PDF shape when varying s (k fixed)")
fig_pdf_s.update_xaxes(type="log", title="x")
fig_pdf_s.update_yaxes(type="log", title="pdf(x)")
fig_pdf_s.show()

6) Derivations#

Expectation (raw moments)#

Start from the raw moment definition (standard form):

\[ \mathbb{E}[X^n] = \int_0^\infty x^n\,f(x; k,s)\,dx = \int_0^\infty x^n\,\frac{k x^{k-1}}{(1+x^s)^{1+k/s}}\,dx. \]

Combine powers and substitute \(y=x^s\):

  • \(x=y^{1/s}\)

  • \(dx = \frac{1}{s}y^{1/s-1}\,dy\)

Then

\[ \mathbb{E}[X^n] = \frac{k}{s}\int_0^\infty \frac{y^{(k+n)/s - 1}}{(1+y)^{1+k/s}}\,dy. \]

Recognize the Beta-function identity

\[ \int_0^\infty \frac{y^{a-1}}{(1+y)^{a+b}}\,dy = B(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)},\qquad a,b>0. \]

Here \(a=(k+n)/s\) and \(b=1-n/s\). This requires \(a>0\) (i.e. \(n>-k\)) and \(b>0\) (i.e. \(n<s\)). Plugging in and using \(\Gamma(1+k/s)=(k/s)\Gamma(k/s)\) yields:

\[ \mathbb{E}[X^n] = \frac{\Gamma\left(\tfrac{k+n}{s}\right)\Gamma\left(1-\tfrac{n}{s}\right)}{\Gamma\left(\tfrac{k}{s}\right)}. \]
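
The result can be sanity-checked with numerical quadrature (a sketch, not part of the derivation):

# Numerical check of E[X^n] against the Gamma-function formula
from scipy import integrate

k_chk2, s_chk2, n_chk2 = 2.5, 3.0, 1.0
val, err = integrate.quad(lambda t: t**n_chk2 * stats.mielke.pdf(t, k_chk2, s_chk2), 0, np.inf)
print("quadrature:", val, "+/-", err)
print("closed form:", mielke_raw_moment(n_chk2, k_chk2, s_chk2))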

Variance#

When \(s>2\):

\[ \mathrm{Var}(X)=\mathbb{E}[X^2]-\mathbb{E}[X]^2, \]

where \(\mathbb{E}[X]\) and \(\mathbb{E}[X^2]\) use the raw-moment formula above.

Likelihood#

Given i.i.d. data \(x_1,\dots,x_n\) with \(x_i>0\), the log-likelihood (standard form) is:

\[ \ell(k,s) = \sum_{i=1}^n \log f(x_i; k,s) = n\log k + (k-1)\sum_{i=1}^n \log x_i - \left(1+\frac{k}{s}\right)\sum_{i=1}^n \log(1+x_i^s). \]

There is no closed-form MLE; in practice you maximize \(\ell(k,s)\) numerically (SciPy’s fit does this for you).

def mielke_loglik(k, s, x):
    x = np.asarray(x, dtype=float)
    if (k <= 0) or (s <= 0) or np.any(x <= 0):
        return -np.inf
    return float(np.sum(mielke_logpdf(x, k, s)))


# quick sanity check: log-likelihood is higher near the true parameters (on average)
k_true, s_true = 2.5, 3.0
x_data = mielke_rvs_numpy(k_true, s_true, size=2500, rng=rng)

for (k_try, s_try) in [(1.8, 3.0), (2.5, 3.0), (3.2, 3.0), (2.5, 2.0), (2.5, 4.0)]:
    print((k_try, s_try), mielke_loglik(k_try, s_try, x_data))
(1.8, 3.0) -2317.0192153765065
(2.5, 3.0) -2192.250620713471
(3.2, 3.0) -2271.5919986517115
(2.5, 2.0) -2430.591198727745
(2.5, 4.0) -2332.061147185981
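
Since there is no closed-form MLE, the log-likelihood can be maximized numerically. Here is a minimal sketch using scipy.optimize.minimize, parameterizing in log-space so that \(k\) and \(s\) stay positive:

from scipy.optimize import minimize

def negloglik(theta, x):
    k, s = np.exp(theta)  # optimize over log(k), log(s) to enforce positivity
    return -mielke_loglik(k, s, x)

res = minimize(negloglik, x0=np.log([1.0, 1.0]), args=(x_data,), method="Nelder-Mead")
print("MLE (k, s):", tuple(np.exp(res.x)), " true:", (k_true, s_true))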

7) Sampling & Simulation (NumPy-only)#

Because the CDF is available in closed form, sampling is straightforward via inverse-transform sampling.

Algorithm#

  1. Draw \(U\sim\mathrm{Uniform}(0,1)\).

  2. Return \(X = Q(U)\) where \(Q(p) = \left(\frac{p^{s/k}}{1 - p^{s/k}}\right)^{1/s}\).

Equivalently (via the Beta connection):

  • draw \(V = U^{s/k}\) so that \(V\sim\mathrm{Beta}(k/s,1)\)

  • set \(X = \left(\frac{V}{1-V}\right)^{1/s}\).

The implementation above (mielke_rvs_numpy) uses the quantile function.
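
For comparison, here is the Beta-connection sampler as a sketch; it agrees in distribution with mielke_rvs_numpy:

def mielke_rvs_beta(k, s, size, rng=None):
    '''Alternative NumPy-only sampler via the Beta(k/s, 1) connection (illustrative).'''
    rng = np.random.default_rng() if rng is None else rng
    v = rng.random(size) ** (s / k)      # V = U^{s/k} ~ Beta(k/s, 1)
    return (v / (1.0 - v)) ** (1.0 / s)  # X = (V/(1-V))^{1/s}

beta_samples = mielke_rvs_beta(2.5, 3.0, size=50_000, rng=np.random.default_rng(7))
print("median (beta sampler):", np.median(beta_samples))
print("median (theory):", float(mielke_ppf(np.array([0.5]), 2.5, 3.0)[0]))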

k_samp, s_samp = 2.5, 3.0
samples = mielke_rvs_numpy(k_samp, s_samp, size=50_000, rng=rng)

qs = np.array([0.1, 0.5, 0.9, 0.99])
q_emp = np.quantile(samples, qs)
q_theory = mielke_ppf(qs, k_samp, s_samp)

print("Quantiles p:", qs)
print("Empirical:", q_emp)
print("Theory:", q_theory)

print("\nSample mean/var (finite here since s=3>2):")
mean_theory = mielke_raw_moment(1, k_samp, s_samp)
var_theory = mielke_raw_moment(2, k_samp, s_samp) - mean_theory**2
print("mean:", samples.mean(), "(theory:", mean_theory, ")")
print("var:", samples.var(), "(theory:", var_theory, ")")
Quantiles p: [0.1  0.5  0.9  0.99]
Empirical: [0.409355 0.921199 1.945194 4.387239]
Theory: [0.406851 0.916873 1.950439 4.351833]

Sample mean/var (finite here since s=3>2):
mean: 1.1137956021290585 (theory: 1.1129126745223055 )
var: 0.8749178323085646 (theory: 0.8646985368757909 )

8) Visualization#

We’ll visualize:

  • the theoretical PDF and CDF

  • Monte Carlo samples (histogram + PDF overlay)

  • empirical CDF vs theoretical CDF

k_vis, s_vis = 2.5, 3.0
x_grid = np.logspace(-3, 3, 900)

# PDF
fig_pdf = go.Figure()
fig_pdf.add_trace(
    go.Scatter(
        x=x_grid,
        y=np.maximum(mielke_pdf(x_grid, k_vis, s_vis), 1e-300),
        mode="lines",
        name="pdf",
    )
)
fig_pdf.update_layout(title=f"Mielke PDF (k={k_vis}, s={s_vis})")
fig_pdf.update_xaxes(type="log", title="x")
fig_pdf.update_yaxes(type="log", title="pdf(x)")
fig_pdf.show()

# CDF
fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=x_grid, y=mielke_cdf(x_grid, k_vis, s_vis), mode="lines", name="cdf"))
fig_cdf.update_layout(title=f"Mielke CDF (k={k_vis}, s={s_vis})")
fig_cdf.update_xaxes(type="log", title="x")
fig_cdf.update_yaxes(title="cdf(x)")
fig_cdf.show()

# Monte Carlo samples: histogram + PDF overlay
samples_vis = mielke_rvs_numpy(k_vis, s_vis, size=30_000, rng=rng)
fig_hist = px.histogram(
    samples_vis,
    nbins=80,
    histnorm="probability density",
    log_x=True,
    opacity=0.55,
    title=f"Monte Carlo histogram vs PDF (k={k_vis}, s={s_vis})",
)
fig_hist.add_trace(go.Scatter(x=x_grid, y=mielke_pdf(x_grid, k_vis, s_vis), mode="lines", name="pdf"))
fig_hist.update_xaxes(title="x")
fig_hist.update_yaxes(title="density")
fig_hist.show()

# Empirical CDF vs theoretical CDF
x_sorted = np.sort(samples_vis)
ecdf = np.arange(1, len(x_sorted) + 1) / len(x_sorted)

fig_ecdf = go.Figure()
fig_ecdf.add_trace(go.Scatter(x=x_sorted, y=ecdf, mode="lines", name="empirical CDF"))
fig_ecdf.add_trace(go.Scatter(x=x_grid, y=mielke_cdf(x_grid, k_vis, s_vis), mode="lines", name="theoretical CDF"))
fig_ecdf.update_layout(title="Empirical vs theoretical CDF")
fig_ecdf.update_xaxes(type="log", title="x")
fig_ecdf.update_yaxes(title="CDF")
fig_ecdf.show()

9) SciPy Integration (scipy.stats.mielke)#

SciPy provides a full implementation:

  • stats.mielke.pdf, logpdf

  • stats.mielke.cdf, ppf

  • stats.mielke.rvs

  • stats.mielke.fit (MLE)

Remember the Burr relation:

  • stats.mielke(k, s) is equivalent to stats.burr(c=s, d=k/s).

k_true, s_true = 2.5, 3.0

dist = stats.mielke(k_true, s_true)  # loc=0, scale=1 by default
x_eval = np.array([0.5, 1.0, 2.0, 5.0])
print("pdf:", dist.pdf(x_eval))
print("cdf:", dist.cdf(x_eval))

# rvs
data = dist.rvs(size=3000, random_state=rng)
print("sample min/max:", data.min(), data.max())

# fit (fix loc=0, scale=1 to estimate only k and s)
k_hat, s_hat, loc_hat, scale_hat = stats.mielke.fit(data, floc=0, fscale=1)
print("\nTrue (k,s):", (k_true, s_true))
print("Fit  (k,s):", (k_hat, s_hat))
print("Returned loc/scale:", (loc_hat, scale_hat))

# Compare NumPy vs SciPy implementations numerically
x_dense = np.logspace(-3, 3, 1000)
max_pdf_diff = np.max(np.abs(mielke_pdf(x_dense, k_true, s_true) - dist.pdf(x_dense)))
max_cdf_diff = np.max(np.abs(mielke_cdf(x_dense, k_true, s_true) - dist.cdf(x_dense)))
print("\nmax |pdf_numpy - pdf_scipy|:", max_pdf_diff)
print("max |cdf_numpy - cdf_scipy|:", max_cdf_diff)
pdf: [0.712222 0.701539 0.125904 0.003942]
cdf: [0.16025  0.561231 0.906511 0.993382]
sample min/max: 0.025821632419133145 10.725107179530955

True (k,s): (2.5, 3.0)
Fit  (k,s): (2.4983676379870374, 3.0199409878085794)
Returned loc/scale: (0, 1)

max |pdf_numpy - pdf_scipy|: 3.3306690738754696e-16
max |cdf_numpy - cdf_scipy|: 9.992007221626409e-16

10) Statistical Use Cases#

Hypothesis testing#

  • Nested model test: the case \(k=s\) is the log-logistic distribution (fisk). You can test \(H_0: k=s\) (log-logistic) vs \(H_1: (k,s)\) free using a likelihood-ratio test (LRT).

  • Goodness-of-fit: QQ-plots or distribution tests (KS/AD) can be used as diagnostics. Be careful: classical p-values assume parameters are known, while in practice they’re often estimated.

Bayesian modeling#

For positive heavy-tailed data, you can use mielke as a likelihood and place priors on \((k,s)\) (e.g. log-normal or log-uniform). There is no conjugate prior, but posterior inference is straightforward with MCMC or a simple grid approximation in low dimensions.

Generative modeling#

  • Useful as a base distribution for positive heavy-tailed generative models.

  • Can be used in mixtures to model multimodal positive data (see the sketch at the end of this section).

  • The Beta/Beta-prime transformations make it convenient inside larger hierarchical models.

# Likelihood-ratio test example: H0 is log-logistic (fisk), H1 is mielke

c0 = 2.5
x = stats.fisk.rvs(c0, size=1500, random_state=rng)

# Fit under H1 (free k,s) and H0 (fisk shape c). Fix loc=0, scale=1 for simplicity.
k_hat1, s_hat1, _, _ = stats.mielke.fit(x, floc=0, fscale=1)
c_hat0, _, _ = stats.fisk.fit(x, floc=0, fscale=1)

ll1 = np.sum(stats.mielke.logpdf(x, k_hat1, s_hat1))
ll0 = np.sum(stats.fisk.logpdf(x, c_hat0))

lrt_stat = 2 * (ll1 - ll0)
p_value = stats.chi2.sf(lrt_stat, df=1)

print("True H0 (fisk c):", c0)
print("Fit H1 (k,s):", (k_hat1, s_hat1))
print("Fit H0 (fisk c):", c_hat0)
print("LRT stat:", float(lrt_stat))
print("Approx p-value (chi^2_1):", float(p_value))
True H0 (fisk c): 2.5
Fit H1 (k,s): (2.525228570404704, 2.5103231593958144)
Fit H0 (fisk c): 2.5155273437500036
LRT stat: 0.043273791422052454
Approx p-value (chi^2_1): 0.8352105904552609
# Simple Bayesian grid posterior over (k,s) with a log-uniform prior p(k,s) ∝ 1/(k s)
# This is an approximation for intuition (not a replacement for MCMC for serious work).

k_true, s_true = 2.5, 3.0
data = stats.mielke.rvs(k_true, s_true, size=400, random_state=rng)
logx = np.log(data)
sum_logx = logx.sum()
n = data.size

k_grid = np.linspace(0.4, 6.0, 90)
s_grid = np.linspace(1.1, 6.0, 90)  # avoid extremely heavy tails for this demo

log_post = np.empty((k_grid.size, s_grid.size), dtype=float)

for j, s in enumerate(s_grid):
    # sum_i log(1 + x_i^s) computed stably
    s_term = np.logaddexp(0.0, s * logx).sum()

    # vectorized log-likelihood over k_grid
    loglike = n * np.log(k_grid) + (k_grid - 1.0) * sum_logx - (1.0 + k_grid / s) * s_term

    # log-uniform prior over the grid bounds
    logprior = -np.log(k_grid) - np.log(s)
    log_post[:, j] = loglike + logprior

# Normalize on the discrete grid (treating cells as equal-area for visualization)
log_post -= logsumexp(log_post)
post = np.exp(log_post)

i_map, j_map = np.unravel_index(np.argmax(post), post.shape)
print("True (k,s):", (k_true, s_true))
print("MAP  (k,s):", (float(k_grid[i_map]), float(s_grid[j_map])))

fig_post = go.Figure(
    data=go.Contour(
        x=s_grid,
        y=k_grid,
        z=post,
        contours_coloring="heatmap",
        colorbar_title="posterior",
    )
)
fig_post.update_layout(title="Grid posterior p(k,s | data) with log-uniform prior")
fig_post.update_xaxes(title="s")
fig_post.update_yaxes(title="k")
fig_post.show()
True (k,s): (2.5, 3.0)
MAP  (k,s): (2.4134831460674153, 3.0820224719101126)
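
Finally, a tiny illustration of the mixture idea from the generative-modeling list above (purely a sketch; the component parameters are arbitrary):

# Two-component mielke mixture: draw a component index, then sample that component
weights = np.array([0.6, 0.4])
params = [(4.0, 5.0), (2.0, 2.5)]  # arbitrary (k, s) pairs for illustration

comp = rng.choice(2, size=10_000, p=weights)
mix = np.empty(comp.size)
for c, (k_c, s_c) in enumerate(params):
    idx = comp == c
    mix[idx] = stats.mielke.rvs(k_c, s_c, size=int(idx.sum()), random_state=rng)

print("mixture quantiles (0.1/0.5/0.9):", np.quantile(mix, [0.1, 0.5, 0.9]))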

11) Pitfalls#

  • Invalid parameters: require \(k>0\) and \(s>0\) (and for SciPy location/scale, scale>0).

  • Moment non-existence: mean requires \(s>1\), variance requires \(s>2\), etc. If \(s\le 1\), sample means are unstable and can be misleading.

  • Near-zero behavior: if \(k<1\), the density diverges at 0; this is not a bug.

  • Numerical issues: direct computation of \(x^s\) can overflow for large \(x\) and large \(s\). Prefer log-space identities like log(1+x^s) = logaddexp(0, s log x); see the sketch below.

  • Fitting can be delicate: heavy tails can produce extreme outliers; MLE may be sensitive to initialization and may have strong parameter correlation.
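
A short sketch of the overflow pitfall and its log-space fix:

# Naive 1 + x**s overflows; the logaddexp identity stays finite
x_big, s_big = np.float64(1e12), 40.0
with np.errstate(over="ignore"):
    naive = np.log1p(x_big**s_big)                 # x**s overflows to inf
stable = np.logaddexp(0.0, s_big * np.log(x_big))  # log(1 + x^s) in log-space
print("naive log(1+x^s):", naive)
print("stable log(1+x^s):", float(stable))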

12) Summary#

  • mielke is a continuous distribution on \(x>0\) with power-law tails.

  • PDF: \(f(x)=\dfrac{k x^{k-1}}{(1+x^s)^{1+k/s}}\); CDF: \(F(x)=(1+x^{-s})^{-k/s}\).

  • Tail index is \(s\): positive moments exist iff \(n<s\).

  • Closed-form raw moments and entropy are available via Gamma/digamma functions.

  • Sampling is easy via the closed-form quantile function (inverse CDF).

  • In SciPy: scipy.stats.mielke (equivalently scipy.stats.burr(c=s, d=k/s)).